Abstract

In the framework of prediction with expert advice, we consider a recently
introduced kind of regret bounds: the bounds that depend on the effective
instead of nominal number of experts. In contrast to the NormalHedge bound,
which mainly depends on the effective number of experts and also weakly
depends on the nominal one, we obtain a bound that does not contain the
nominal number of experts at all. We use the defensive forecasting method and
introduce an application of defensive forecasting to multivalued
supermartingales.

1 Introduction

We consider the problem of prediction with expert advice (PEA) and its variant,
decision-theoretic online learning (DTOL). In the PEA framework (see [3] for
details, references and historical notes), at each step Learner gets decisions
(also called predictions) of several Experts and must make his own decision.
Then the environment generates an outcome and a (real-valued) loss is
calculated for each decision as a known function of decision and outcome. The
difference between cumulative losses of Learner and one of Experts is the
regret to this Expert. Learner aims at minimizing his regret to Experts, for
any sequence of Expert decisions and outcomes.
In DTOL, introduced in [5], Learner's decision is a probability distribution
on a finite set of actions. Then each action incurs a loss (the vector of the
losses can be regarded as the outcome), and Learner suffers the loss equal to
the expected loss over all actions (according to the probabilities from his
decision). The regret is the difference between the cumulative losses of
Learner and one of the actions. One can interpret each action as a rigid Expert
that always suggests this action. A precise connection between the DTOL and PEA
frameworks will be described in Section 2.

Usually Learner is required to have small regret to all Experts. In other
words, a strategy for Learner must have a guaranteed upper bound on Learner's
regret to the best Expert (one with the minimal loss). In this paper we deal
with another kind of bound, recently introduced in [4]. It captures the
following intuition. Generally speaking, the more Experts (or actions, in the
DTOL terminology) Learner must take into account, the worse his performance
will be. However, assume that each Expert has several different names, so
Learner is given a lot of identical advice. It seems natural that the loss of
Learner is big if there is a real controversy between Experts (or a real
difference between actions), and small if most of the Experts agree with each
other. So a competent regret bound should depend on the real number of Experts
instead of the nominal one. Another example: assume that all the actions are
different, but many of them are good, that is, there are many ways to achieve
some goal. Then Learner has less space to make a mistake and to select a bad
action. Again it seems that a competent regret bound should depend on the
fraction of the good actions rather than the nominal number of actions.

If the effective number of actions (Experts) is significantly less than the
nominal one, one can loosely say that the number of actions is unknown in
this setting. The following regret bound obtained in [4] for their NormalHedge
algorithm holds for this case:

    $$L_T \le L_T^\epsilon + O\left(\sqrt{T \ln\frac{1}{\epsilon}} + \ln^2 N\right), \qquad (1)$$

where $N$ is the nominal number of actions, $L_T$ is the cumulative loss of
Learner after step $T$ and $L_T^\epsilon$ is the value such that at least an
$\epsilon$-fraction of actions have smaller or equal cumulative loss after
step $T$ (or $L_T^\epsilon$ can be interpreted as the loss of the
$\epsilon N$-th best action). It is important that the bound holds uniformly
for all $\epsilon$ and $T$ and the algorithm does not need to know them in
advance. The number $1/\epsilon$ plays the role of the effective number of
actions. The bound shows, in a sense, that the NormalHedge algorithm can work
even if the number of actions is not known.

Our main result (Theorem 9) is the following bound for a new algorithm:

    $$L_T \le L_T^\epsilon + 2\sqrt{T \ln\frac{1}{\epsilon}} + 7\sqrt{T}.$$

This bound is also uniform in $T$ and $\epsilon$. In contrast to (1), our bound
does not depend on the nominal number of actions, whereas (1) contains a term
$O(\ln^2 N)$. So it is the first (as far as we know) bound strictly in terms of
the effective number of actions. Our bound has a simpler structure, but it is
generally incomparable to the (precise) bound for NormalHedge from [4] (see
Subsection 4.2 for discussion of different known bounds). Also our bound can
be easily adapted to internal regret (see [12] for definition). We describe the
application to internal regret in Subsection 4.3.

Our bound is obtained with the help of the defensive forecasting method
(DF). The DF is based on bounding the growth of some supermartingale (a
kind of potential function). In [5], the DF was used to obtain bounds of the
form $L_T \le c L_T^n + a$, where $c$ and $a$ are some constants. For our form
of bounds, we need a new variation of the DF and a new sort of
supermartingales. So we introduce the notion of multivalued supermartingale and
prove a boundedness result for them (Lemmas 2 and 3). (This result is of
certain independent interest: for example, it helps to get rid of additional
Assumption 3 in Theorem 3 in [5].)

The paper is organized as follows. In Section 2 we describe the setup of
prediction with expert advice and of the decision-theoretic framework for
online learning, and define the $\epsilon$-quantile regret. In Section 3 we
describe the Defensive Forecasting Algorithm, define multivalued
supermartingales and discuss their properties, and introduce supermartingales
of a specific form that are based on the Hoeffding inequality. In
Subsection 4.1 we prove two loss bounds on the $\epsilon$-quantile regret, and
in Subsection 4.2 we compare them with the bound for the NormalHedge algorithm
and with other known bounds. In Subsection 4.3 we show how these bounds can be
transformed into bounds on the internal regret. In the last subsection we
describe a toy example of an algorithm that guarantees bounds for two very
different loss functions simultaneously.

2 Notation and Setup 

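Vectors with coordinates $p_1, \ldots, p_N$ are denoted by an arrow over the
letter: $\vec p = (p_1, \ldots, p_N)$. For any natural $N$, by $\Delta_N$ we
denote the standard simplex in $\mathbb{R}^N$:
$\Delta_N = \{\vec p \in [0,1]^N \mid \sum_{n=1}^N p_n = 1\}$. By
$\vec p \cdot \vec q$ we denote the scalar product:
$\vec p \cdot \vec q = \sum_{n=1}^N p_n q_n$.
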

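Protocol 1 Decision-theoretic framework for learning
  $L_0 := 0$.
  $L_0^n := 0$, $n = 1, \ldots, N$.
  for $t = 1, 2, \ldots$ do
    Learner announces $\vec\gamma_t \in \Delta_N$.
    Reality announces $\vec\omega_t \in [0,1]^N$.
    $L_t := L_{t-1} + \vec\gamma_t \cdot \vec\omega_t$.
    $L_t^n := L_{t-1}^n + \omega_{t,n}$, $n = 1, \ldots, N$.
  end for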



The decision-theoretic framework for online learning (DTOL) was introduced
in [5]. The DTOL protocol is given as Protocol 1. The Learner has $N$ available
actions, and at each step $t$ he must assign probability weights
$\gamma_{t,1}, \ldots, \gamma_{t,N}$ to these actions. Then each action suffers
a loss $\omega_{t,n}$, and Learner's loss is the expected loss over all actions
according to the weights he assigned. Learner's goal is to keep small his
regret $R_t^n = L_t - L_t^n$ to any action $n$, independent of the losses.
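
The bookkeeping of Protocol 1 can be sketched in a few lines of Python. This is
only an illustration: the function names and the uniform baseline learner are
hypothetical and are not part of the paper; any strategy returning a point of
the simplex could be plugged in instead.

```python
def run_dtol(learner, losses):
    """Play Protocol 1 on a fixed sequence of loss vectors in [0, 1]^N.

    learner(t, action_losses) must return weights on the simplex.
    Returns Learner's cumulative loss L_T and the list of L_T^n.
    """
    n_actions = len(losses[0])
    learner_loss = 0.0                      # L_0 := 0
    action_losses = [0.0] * n_actions       # L_0^n := 0
    for t, omega in enumerate(losses, start=1):
        gamma = learner(t, action_losses)   # Learner announces gamma_t
        assert abs(sum(gamma) - 1.0) < 1e-9 and min(gamma) >= 0.0
        # L_t := L_{t-1} + gamma_t . omega_t
        learner_loss += sum(g * w for g, w in zip(gamma, omega))
        # L_t^n := L_{t-1}^n + omega_{t,n}
        action_losses = [l + w for l, w in zip(action_losses, omega)]
    return learner_loss, action_losses


def uniform_learner(t, action_losses):
    # Hypothetical baseline: equal weights at every step.
    n = len(action_losses)
    return [1.0 / n] * n


def regrets(learner_loss, action_losses):
    # R_T^n = L_T - L_T^n for every action n.
    return [learner_loss - l for l in action_losses]
```

Calling `run_dtol(uniform_learner, losses)` on a list of loss vectors returns
the pair $(L_T, (L_T^n)_n)$, from which `regrets` gives $R_T^n$.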



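Protocol 2 Prediction with Expert Advice
  $L_0 := 0$.
  $L_0^n := 0$, $n = 1, \ldots, N$.
  for $t = 1, 2, \ldots$ do
    Expert $n$ announces $\gamma_t^n \in \Gamma$, $n = 1, \ldots, N$.
    Learner announces $\gamma_t \in \Gamma$.
    Reality announces $\omega_t \in \Omega$.
    $L_t := L_{t-1} + \lambda(\gamma_t, \omega_t)$.
    $L_t^n := L_{t-1}^n + \lambda(\gamma_t^n, \omega_t)$, $n = 1, \ldots, N$.
  end for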



DTOL can be regarded as a special case of prediction with expert advice
(PEA), as explained below. The PEA protocol is given as Protocol 2. The
game is specified by the set of outcomes $\Omega$, the set of decisions
$\Gamma$ and the loss function $\lambda : \Gamma \times \Omega \to \mathbb{R}$.
The game is played repeatedly by Learner having access to decisions made by a
pool of Experts. At each step, Learner is given $N$ Experts' decisions and is
required to come out with his own decision. The loss $\lambda(\gamma, \omega)$
measures the discrepancy between the decision $\gamma$ and the outcome
$\omega$. $L_t$ is Learner's cumulative loss over the first $t$ steps, and
$L_t^n$ is the $n$-th Expert's cumulative loss over the first $t$ steps. The
goal of Learner is the same: to keep small his regret $R_t^n = L_t - L_t^n$ to
any Expert $n$, independent of Experts' moves and the outcomes.

As defined in [4] (for DTOL), the regret to the top $\epsilon$-quantile (at
step $T$) is the value $R_T^\epsilon$ such that there are at least $\epsilon N$
actions (the fraction at least $\epsilon$ of all Experts) with
$R_T^n \ge R_T^\epsilon$. Or, equivalently, $R_T^\epsilon = L_T - L_T^\epsilon$
where $L_T^\epsilon$ is a value such that at least $\epsilon N$ actions (the
fraction at least $\epsilon$ of all Experts) have the loss not greater than
$L_T^\epsilon$.
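
For a finite set of actions, the quantile loss can be computed by sorting. The
sketch below is illustrative only: the function names are hypothetical, and it
returns the smallest admissible value of $L_T^\epsilon$, which gives the
largest (most demanding) value of the regret $R_T^\epsilon$.

```python
import math


def quantile_loss(action_losses, eps):
    """Smallest L such that at least an eps-fraction of the actions
    have cumulative loss not greater than L."""
    n = len(action_losses)
    # Number of actions that must be at least as good as the threshold;
    # the small offset guards against rounding errors in eps * n.
    k = max(1, math.ceil(eps * n - 1e-12))
    return sorted(action_losses)[k - 1]


def quantile_regret(learner_loss, action_losses, eps):
    # R_T^eps = L_T - L_T^eps
    return learner_loss - quantile_loss(action_losses, eps)
```

With `eps = 1 / len(action_losses)` the function `quantile_loss` returns the
loss of the best action, so the usual best-Expert regret is recovered.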

A uniform bound on $R_T^\epsilon$ (in other words, a bound on Learner's loss
$L_T$ in terms of $L_T^\epsilon$) that holds for all $\epsilon$ is more general
than the standard best Expert bounds. The latter can be obtained as a special
case for $\epsilon = 1/N$. For this reason, it is natural to call the value
$1/\epsilon$ the effective number of actions: a bound on $R_T^\epsilon$ can be
considered as the best Expert bound in an imaginary game against $1/\epsilon$
Experts.

Let us say what games $(\Omega, \Gamma, \lambda)$ we consider in this paper.
For any game $(\Omega, \Gamma, \lambda)$, we call
$\Lambda = \{g \in \mathbb{R}^\Omega \mid \exists\gamma \in \Gamma\ \forall\omega \in \Omega\ g(\omega) = \lambda(\gamma, \omega)\}$
the prediction set. The prediction set captures most of the information about
the game. The prediction set is assumed to be non-empty. In this paper, we
consider bounded convex compact games only. This means that we assume that the
set $\Lambda$ is bounded and compact, and the superprediction set
$\Lambda + [0, \infty]^\Omega$ is convex, that is, for any
$g_1, \ldots, g_K \in \Lambda$ and for any $p_1, \ldots, p_K \in [0,1]$,
$\sum_{k=1}^K p_k = 1$, there exists $g \in \Lambda$ such that
$g(\omega) \le \sum_{k=1}^K p_k g_k(\omega)$ for all $\omega \in \Omega$. For
such games, we assume without loss of generality that
$\Lambda \subseteq [0,1]^\Omega$ (we can always scale the loss function).

For DTOL as a special case of PEA, the outcome space is $\Omega = [0,1]^N$, the
decision space is $\Gamma = \Delta_N$, and the loss function is
$\lambda(\vec\gamma, \vec\omega) = \vec\gamma \cdot \vec\omega$. Experts play
fixed strategies always choosing $\vec\gamma_t^n$ such that
$\gamma_{t,n}^n = 1$ and $\gamma_{t,k}^n = 0$ for $k \ne n$ (see e.g.
[13, Example 7] for more details about this game).

In an important sense the general PEA protocol for the bounded convex
games is equivalent to DTOL. Obviously, if some upper bound on regret is
achievable in any PEA game then it is achievable in the special case of the
DTOL game. To see how to transfer an upper bound from DTOL to a PEA
game, let us interpret the decisions $\gamma_t^n$ of Experts and the outcome
$\omega_t$ in the PEA game as the outcome $\vec\omega'_t$ in DTOL:
$\omega'_{t,n} = \lambda(\gamma_t^n, \omega_t)$. If Learner's decision
$\gamma_t$ satisfies
$\lambda(\gamma_t, \omega_t) \le \sum_{n=1}^N \gamma'_{t,n} \lambda(\gamma_t^n, \omega_t)$,
where $\vec\gamma'_t$ is Learner's decision in DTOL, then the regret (at step
$t$) in the PEA game will be not greater than the regret in DTOL. It remains to
note that, since the game is convex, for any $\vec\gamma'_t$ there exists
$\gamma_t$ such that
$\lambda(\gamma_t, \omega) \le \sum_{n=1}^N \gamma'_{t,n} \lambda(\gamma_t^n, \omega)$
for all $\omega \in \Omega$.
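
As an illustration of this transfer, here is a minimal Python sketch. It
assumes a hypothetical game with decisions in $[0,1]$, outcomes in $\{0,1\}$
and the square loss. Because this loss is convex in the decision, the weighted
average of the Experts' decisions can serve as Learner's PEA decision; for a
general convex game the text above only guarantees that some suitable decision
exists. All function names are illustrative.

```python
def square_loss(gamma, omega):
    # Assumed loss function: convex in the decision gamma.
    return (gamma - omega) ** 2


def dtol_outcome(expert_decisions, omega, loss=square_loss):
    # omega'_{t,n} = lambda(gamma_t^n, omega_t): Experts' losses become
    # the loss vector of the auxiliary DTOL game.
    return [loss(g, omega) for g in expert_decisions]


def mix_decision(dtol_weights, expert_decisions):
    # PEA decision built from the DTOL weights gamma'_t by averaging.
    return sum(w * g for w, g in zip(dtol_weights, expert_decisions))


def transfer_gap(dtol_weights, expert_decisions, omega, loss=square_loss):
    # Expected DTOL loss minus PEA loss; non-negative for convex losses.
    dtol_loss = sum(w * l for w, l in
                    zip(dtol_weights, dtol_outcome(expert_decisions, omega, loss)))
    pea_loss = loss(mix_decision(dtol_weights, expert_decisions), omega)
    return dtol_loss - pea_loss
```

A non-negative `transfer_gap` for every outcome means that the PEA loss never
exceeds the DTOL loss at that step, so the DTOL regret bound carries over.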

However, the equivalence between DTOL and PEA is limited. In particular,
we can obtain PEA bounds that hold for specific loss functions or classes of
loss functions (such as mixable loss functions [13]), and these bounds may be
much stronger than the general bounds induced by DTOL.

In this paper, we consider PEA and DTOL in parallel for another reason.
It is sometimes useful to consider a more general variant of Protocol 2 where
the number of Experts is infinite (and maybe uncountably infinite): then PEA
can be applied to large families of functions as Experts. With the help of our
method, we can cope either with DTOL, where the number of actions is finite,
or with PEA when $\Omega$ is finite and the number of Experts is arbitrary. So
we cannot infer a bound for infinitely many Experts from a DTOL result, but we
can obtain a PEA result directly. In the sequel, we will write about $N$
experts, but always allow $N$ to be infinite in the PEA case.

Most of the presentation below is in the terms of PEA but applicable to 
DTOL as well. We normally hide the difference between PEA and DTOL behind 
the common notation (DTOL is considered as the game described above). When 
the difference is important, we give two parallel fragments of a statement or 
proof. 

3 Defensive Forecasting and Supermartingales 

This section contains the technical results we need to construct our prediction 
algorithm. They are used in the proofs but not in the theorem statements and 
discussions in the next section. 

3.1 Defensive Forecasting 

The general structure of the Defensive Forecasting Algorithm (DFA) is quite
simple. At step $t$, we define a function
$f_t : \Gamma \times \Omega \to \mathbb{R}$ (with special properties, see
below) and look for $\gamma \in \Gamma$ such that

    $$\forall\omega \in \Omega \quad f_t(\gamma, \omega) \le f_{t-1}(\gamma_{t-1}, \omega_{t-1}), \qquad (2)$$

where $f_{t-1}$ is the function defined at the previous step, $\gamma_{t-1}$ is
Learner's decision at the previous step, and $\omega_{t-1}$ is the outcome at
the previous step. Then $\gamma$ with this property is announced as the next
decision of Learner $\gamma_t$.

The choice of $f_t$ may depend on all the previous decisions, outcomes, and on
this step Experts' decisions (for PEA), so
$f_t = \mathcal{F}(\{\gamma_1^n\}_{n=1}^N, \gamma_1, \omega_1, \ldots, \{\gamma_t^n\}_{n=1}^N)$.
Having specified $\mathcal{F}$, we call this strategy of Learner an application
of the DFA to $\mathcal{F}$.

The algorithm guarantees that the values of $f_t$ do not increase; in
particular, after each step the value $f_t(\gamma_t, \omega_t)$ is not greater
than some initial value $f_0$. We will choose $\mathcal{F}$ so that the
inequality
$\mathcal{F}(\{\gamma_1^n\}_{n=1}^N, \gamma_1, \omega_1, \ldots, \{\gamma_t^n\}_{n=1}^N)(\gamma_t, \omega_t) \le f_0$
implies a loss bound we need.

Also we need to guarantee that the algorithm always can find $\gamma$
satisfying (2). To this end we will choose $\mathcal{F}$ so that the sequence
$f_t$ will be a (multivalued) supermartingale as defined in the next
subsection.

3.2 Multivalued Supermartingales 

Let $\Omega$ be a compact metric space. Any finite set $\Omega$ is considered
as a metric space with the discrete metric. Let $\mathcal{P}(\Omega)$ be the
space of all probability measures on $\Omega$ supplied with the weak topology.
For any measurable function $g \in \mathbb{R}^\Omega$ and any
$\pi \in \mathcal{P}(\Omega)$, denote

    $$E_\pi g = \int_\Omega g(\omega)\,\pi(d\omega).$$

For finite $\Omega$, this definition reduces to the scalar product:

    $$E_\pi g = \sum_{\omega \in \Omega} g(\omega)\,\pi(\omega).$$

Let $S$ be an operator that to any sequence
$e_1, \pi_1, \omega_1, \ldots, e_{T-1}, \pi_{T-1}, \omega_{T-1}, e_T$, where
$\omega_t \in \Omega$, $\pi_t \in \mathcal{P}(\Omega)$, $t = 1, \ldots, T-1$,
and $e_t$, $t = 1, \ldots, T$, are some arbitrary values, assigns a function
$S_T : \mathcal{P}(\Omega) \to \mathbb{R}^\Omega$. To simplify notation, we
will hide the dependence of $S_T$ on all the long argument sequence in the
index $T$. We call $S$ a (game-theoretic) supermartingale if for any sequence
of arguments, for any $\pi \in \mathcal{P}(\Omega)$, for
$g_{T-1} = S_{T-1}(\pi_{T-1})$ and for $g = S_T(\pi)$ it holds

    $$E_\pi g \le g_{T-1}(\omega_{T-1}). \qquad (3)$$

This definition of supermartingale is equivalent to the one given in [5]. We
say that supermartingale $S$ is forecast-continuous if every $S_T$ is a
continuous function.

The main property of forecast-continuous supermartingales that makes them
useful in our context is given by Lemma 1. Originally, a variant of the lemma
was obtained by Leonid Levin in 1976. The proof is based on fixed-point
considerations, see [10, Theorem 16.1] or [..., Lemma 8] for details.

Lemma 1. Let $\Omega$ be a compact metric space. Let a function
$q : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ be continuous as
function from $\mathcal{P}(\Omega)$ to $\mathbb{R}^\Omega$. If for all
$\pi \in \mathcal{P}(\Omega)$ it holds that

    $$E_\pi q(\pi, \cdot) \le C,$$

where $C \in \mathbb{R}$ is some constant, then

    $$\exists\pi \in \mathcal{P}(\Omega)\ \forall\omega \in \Omega \quad q(\pi, \omega) \le C.$$

The lemma guarantees that for any forecast-continuous supermartingale $S$
we can always choose $\pi_T$ and $g_T = S_T(\pi_T)$ such that
$g_T(\omega) \le g_{T-1}(\omega_{T-1})$ for all $\omega$. This is exactly the
kind of condition we need for the DFA.

Unfortunately, for the loss bounds we want to obtain, we did not find a
suitable forecast-continuous supermartingale. So we define a more general
notion of multivalued supermartingale, and prove an appropriate variant of
Levin's lemma.

To get the definition of a multivalued supermartingale, we make just three
changes in the definition of supermartingale above: operator $S$ depends
additionally on $g_t \in S_t(\pi_t)$; $S_T$ is a function from
$\mathcal{P}(\Omega)$ to non-empty subsets of $\mathbb{R}^\Omega$; the
condition (3) holds for any $g \in S_T(\pi)$. Namely, let $S$ be an operator
that to any sequence
$e_1, \pi_1, g_1, \omega_1, \ldots, e_{T-1}, \pi_{T-1}, g_{T-1}, \omega_{T-1}, e_T$,
where $\omega_t \in \Omega$, $\pi_t \in \mathcal{P}(\Omega)$,
$g_t \in \mathbb{R}^\Omega$, $t = 1, \ldots, T-1$, and $e_t$,
$t = 1, \ldots, T$, are some arbitrary values, assigns a function
$S_T : \mathcal{P}(\Omega) \to 2^{\mathbb{R}^\Omega}$ such that $S_T(\pi)$ is a
non-empty subset of $\mathbb{R}^\Omega$ for all $\pi \in \mathcal{P}(\Omega)$.
$S$ is called a multivalued supermartingale if for any sequence of arguments
where $g_t \in S_t(\pi_t)$, for any $\pi \in \mathcal{P}(\Omega)$,
$S_T(\pi) \ne \emptyset$ and for all $g \in S_T(\pi)$ it holds

    $$E_\pi g \le g_{T-1}(\omega_{T-1}). \qquad (4)$$

A multivalued supermartingale is called forecast-continuous if for every
$S_T$, the set $\{(\pi, g) \mid \pi \in \mathcal{P}(\Omega), g \in S_T(\pi)\}$
is closed and additionally for every $\pi \in \mathcal{P}(\Omega)$ the set
$S_T(\pi) + [0, \infty]^\Omega = \{g \in \mathbb{R}^\Omega \mid \exists g' \in S_T(\pi)\ \forall\omega \in \Omega\ g'(\omega) \le g(\omega)\}$
is convex.

Note that if $S$ is a forecast-continuous multivalued supermartingale and
$S_T(\pi)$ always consists of exactly one element, $S$ is (equivalent to) a
forecast-continuous supermartingale in the former sense: closedness of the
graph of $S_T$ means that $S_T(\pi)$ is a continuous function of $\pi$ and the
convexity requirement becomes trivial.

3.3 Levin's Lemma for Multivalued Supermartingales 

Here we prove two versions of Levin's lemma suitable for multivalued
supermartingales. The first variant (it is simpler) will be used for PEA with
finite outcome set $\Omega$. The second variant will be used for DTOL.

Lemma 2. Let $\Omega$ be a finite set. Let $X$ be a compact subset of
$\mathbb{R}^\Omega$. Let $q \subseteq \mathcal{P}(\Omega) \times X$ be a
relation. Denote $q(\pi) = \{g \mid (\pi, g) \in q\}$ and
$\operatorname{ran} q = \bigcup_{\pi \in \mathcal{P}(\Omega)} q(\pi) \subseteq X$.
Suppose that $q$ is closed, for every $\pi \in \mathcal{P}(\Omega)$ the set
$q(\pi)$ is non-empty and the set $q(\pi) + [0, \infty]^\Omega$ is convex. If
for some real constant $C$ it holds that for every
$\pi \in \mathcal{P}(\Omega)$

    $$\forall g \in q(\pi) \quad E_\pi g \le C,$$

then there exists $g \in \operatorname{ran} q$ such that

    $$\forall\omega \in \Omega \quad g(\omega) \le C.$$

We derive the lemma from Lemma 1 similarly to the derivation of Kakutani's
fixed point theorem for multi-valued mappings (see, e.g. [1, Theorem 11.9])
from Brouwer's fixed point theorem. Unfortunately, we did not find a way just
to refer to Kakutani's theorem and have to repeat the whole construction with
appropriate changes.

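Proof. Note first that $\mathcal{P}(\Omega)$ is compact for finite $\Omega$,
hence $q$ is compact as a closed subset of a compact set. Let
$M_q = \max_{g \in \operatorname{ran} q,\, \omega \in \Omega} |g(\omega)|$.

For every natural $m > 0$, let us take any $(1/m)$-net $\{\pi_k^m\}$ on
$\mathcal{P}(\Omega)$ such that for every $\pi \in \mathcal{P}(\Omega)$ there
is at least one net element $\pi_k^m$ at the distance less than $1/m$ from
$\pi$ and for every $\pi \in \mathcal{P}(\Omega)$ there are at most
$4|\Omega|^2$ elements of the net at the distances less than $1/m$ from $\pi$.
(One can use here any reasonable distance on $\mathcal{P}(\Omega)$, for
example, the maximum absolute value of the coordinates of the difference.) For
every $\pi_k^m$ in the net, fix any $g_k^m \in q(\pi_k^m)$ (recall that
$q(\pi_k^m)$ is non-empty).

Now let us define a function
$q^m : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ as a linear
interpolation of the points $(\pi_k^m, g_k^m)$. Namely, let $\{u_k^m\}$ be a
partition of unity of $\mathcal{P}(\Omega)$ subordinate to
$U_{1/m}(\pi_k^m)$, the $(1/m)$-neighborhoods of $\pi_k^m$ (that is,
$u_k^m(\pi)$ are non-negative, $u_k^m(\pi) = 0$ if the distance between $\pi$
and $\pi_k^m$ is $1/m$ or more, and the sum over $k$ of all $u_k^m(\pi)$ is 1
at any $\pi$). Let $q^m(\pi, \omega) = \sum_k u_k^m(\pi) g_k^m(\omega)$.

The function $q^m$ is forecast-continuous. Let us find an upper bound on
$E_\pi q^m(\pi, \cdot)$:

    $$E_\pi q^m(\pi, \cdot) = \sum_k u_k^m(\pi) E_\pi g_k^m
      = \sum_k u_k^m(\pi) E_{\pi_k^m} g_k^m
        + \sum_k u_k^m(\pi) \sum_{\omega \in \Omega} (\pi(\omega) - \pi_k^m(\omega)) g_k^m(\omega)
      \le C + M_q |\Omega| / m$$

(the bound on the first term holds since $g_k^m \in q(\pi_k^m)$ and hence
$E_{\pi_k^m} g_k^m \le C$; the bound on the second term holds since
$u_k^m(\pi) \ne 0$ only if $|\pi(\omega) - \pi_k^m(\omega)| < 1/m$ for all
$\omega$). By Lemma 1 we can find a point $\pi^m \in \mathcal{P}(\Omega)$ such
that

    $$\forall\omega \in \Omega \quad q^m(\pi^m, \omega) \le C + M_q |\Omega| / m.$$

Recalling that $q^m(\pi^m, \omega) = \sum_k u_k^m(\pi^m) g_k^m(\omega)$ and
that there are at most $4|\Omega|^2$ non-zero values among $u_k^m(\pi^m)$, we
get the following statement. There exist some $\alpha_k^m \ge 0$,
$k = 1, \ldots, 4|\Omega|^2$, $\sum_k \alpha_k^m = 1$, and some
$g_k^m \in q(\pi_k^m)$ with $\pi_k^m$ at the distance at most $1/m$ from
$\pi^m$ such that

    $$\forall\omega \in \Omega \quad \sum_{k=1}^{4|\Omega|^2} \alpha_k^m g_k^m(\omega) \le C + M_q |\Omega| / m. \qquad (5)$$

Since $\mathcal{P}(\Omega)$ is compact, we can find a limit point $\pi^*$ of
$\pi^m$. It will be a limit point of $\pi_k^m$ as well. Since $q$ is compact,
we can find $g_k^* \in q(\pi^*)$ such that $(\pi^*, g_k^*)$ are limit points of
$(\pi_k^m, g_k^m)$ for each $k$. Finally, since
$\mathcal{P}(\{1, \ldots, 4|\Omega|^2\})$ is compact, we can find limit points
$\alpha_k^*$ (corresponding to the points $g_k^*$).

Taking the limits as $m \to \infty$ over the convergent subsequences in (5), we
get

    $$\forall\omega \in \Omega \quad \sum_{k=1}^{4|\Omega|^2} \alpha_k^* g_k^*(\omega) \le C.$$

Since $q(\pi^*) + [0, \infty]^\Omega$ is convex, the convex combination
$\sum_{k=1}^{4|\Omega|^2} \alpha_k^* g_k^*$ belongs to
$q(\pi^*) + [0, \infty]^\Omega$. In other words, the combination is minorized
by some $g^* \in q(\pi^*)$ and

    $$g^*(\omega) \le \sum_{k=1}^{4|\Omega|^2} \alpha_k^* g_k^*(\omega) \le C$$

for all $\omega \in \Omega$. □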

Now let us prove a variant of the lemma suitable for the DTOL framework,
where the set of outcomes is infinite. Here we make a strong assumption: the
supermartingale values $S_T(\pi)$ depend on $\pi$ in a very limited way: just
on the mean of $\pi$.

Lemma 3. Let $\Omega$ be $[0,1]^N$. Let $X$ be a compact subset of
$\mathbb{R}^\Omega$. Let $q \subseteq \mathcal{P}(\Omega) \times X$ be a
relation. Denote $q(\pi) = \{g \mid (\pi, g) \in q\}$ and
$\operatorname{ran} q = \bigcup_{\pi \in \mathcal{P}(\Omega)} q(\pi) \subseteq X$.
Assume that if $\int \omega\,\pi_1(d\omega) = \int \omega\,\pi_2(d\omega)$ then
$q(\pi_1) = q(\pi_2)$. Suppose that $q$ is closed, for every
$\pi \in \mathcal{P}(\Omega)$ the set $q(\pi)$ is non-empty and the set
$q(\pi) + [0, \infty]^\Omega$ is convex. If for some real constant $C$ it holds
that for every $\pi \in \mathcal{P}(\Omega)$

    $$\forall g \in q(\pi) \quad E_\pi g \le C,$$

then there exists $g \in \operatorname{ran} q$ such that

    $$\forall\omega \in \Omega \quad g(\omega) \le C.$$

Proof. Since $[0,1]^N$ is a compact metric space, the space
$\mathcal{P}([0,1]^N)$ with weak topology is compact too (see, e.g.
[10, Prop. B.28]). Hence $q$ is compact as a closed subset of a compact set.
Let $M_q = \max_{g \in \operatorname{ran} q,\, \omega \in \Omega} |g(\omega)|$.

We consider $\mathcal{P}(\Omega)$ as a metric space with Wasserstein distance
$W(\pi, \pi') = \sup_f |E_\pi f - E_{\pi'} f|$, where the supremum is taken
over 1-Lipschitz functions $f$ (see [10, Def. B.20]). For every natural
$m > 0$, let us construct a $(1/m)$-net $\{\pi_k^m\}$ on $\mathcal{P}(\Omega)$
with the following property. Let $\omega_i^m$ be a $(1/(2m))$-net on $\Omega$
such that at most $4N^2$ of its elements are at the distance less than
$1/(2m)$ from any $\omega \in \Omega$. For any $\pi_k^m$ there exists
$\omega_i^m$ such that $\int \omega\,\pi_k^m(d\omega) = \omega_i^m$. (A net
with this property exists: note that for any $\pi$ there is a $\pi'$ at the
distance at most $1/(2m)$ such that $\int \omega\,\pi'(d\omega) = \omega_i^m$;
it remains to consider a cover of $1/(2m)$-neighborhoods centered in all $\pi$
with the given expected values.) For every $\pi_k^m$ in the net, let us take
any $g_k^m \in q(\pi_k^m)$.

Now let us define a function
$q^m : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ as a linear
interpolation of the points $(\pi_k^m, g_k^m)$. Namely, let $\{u_k^m\}$ be a
partition of unity of $\mathcal{P}(\Omega)$ subordinate to
$U_{1/m}(\pi_k^m)$, the $(1/m)$-neighborhoods of $\pi_k^m$ (that is,
$u_k^m(\pi)$ are non-negative, $u_k^m(\pi) = 0$ if the distance between $\pi$
and $\pi_k^m$ is $1/m$ or more, and the sum over $k$ of all $u_k^m(\pi)$ is 1
at any $\pi$). Let $q^m(\pi, \omega) = \sum_k u_k^m(\pi) g_k^m(\omega)$.

The function $q^m$ is forecast-continuous. Let us find an upper bound on
$E_\pi q^m(\pi, \cdot)$:

    $$E_\pi q^m(\pi, \cdot) = \sum_k u_k^m(\pi) E_\pi g_k^m
      = \sum_k u_k^m(\pi) E_{\pi_k^m} g_k^m
        + \sum_k u_k^m(\pi) \left(E_\pi g_k^m - E_{\pi_k^m} g_k^m\right)
      \le C + M_q / m$$

(the bound on the first term holds since $g_k^m \in q(\pi_k^m)$ and hence
$E_{\pi_k^m} g_k^m \le C$). By Lemma 1 we can find a point
$\pi^m \in \mathcal{P}(\Omega)$ such that

    $$\forall\omega \in \Omega \quad q^m(\pi^m, \omega) \le C + M_q / m.$$

Among $\pi_k^m$ such that $u_k^m(\pi^m)$ is non-zero, there are at most $4N^2$
different expected values. Let us group all $g_k^m$ corresponding to $\pi_k^m$
with a certain expected value. They belong to the same set $q(\pi_k^m)$, thus
their convex combination is minorized by another element $\tilde g_i^m$ of the
same set. Thus we arrive at the following statement: there are some
$\alpha_i^m \ge 0$, $i = 1, \ldots, 4N^2$, $\sum_i \alpha_i^m = 1$, and some
$\tilde g_i^m \in q(\pi_i^m)$ with $\pi_i^m$ at the distance at most $1/m$ from
$\pi^m$ such that

    $$\forall\omega \in \Omega \quad \sum_{i=1}^{4N^2} \alpha_i^m \tilde g_i^m(\omega) \le C + M_q / m. \qquad (6)$$

The rest of the proof is the same as in Lemma 2. □

3.4 Hoeffding Supermartingale

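Here we introduce a specific multivalued supermartingale, or rather a family of
supermartingales, that will be used for our main results.

For technical convenience, our definition of supermartingale $S_t$ consists of
two parts: a function $G : \mathcal{P}(\Omega) \to 2^\Gamma$, which assigns a
set of decisions $G(\pi) \subseteq \Gamma$ to every
$\pi \in \mathcal{P}(\Omega)$, and a function
$f_t : \Gamma \times \Omega \to \mathbb{R}$. The values of $S_t$ are defined
by the formula:

    $$S_t(\pi) = \{g \in \mathbb{R}^\Omega \mid \exists\gamma \in G(\pi)\ \forall\omega \in \Omega\ g(\omega) = f_t(\gamma, \omega)\}. \qquad (7)$$

The part $G(\pi)$ depends on the game $(\Omega, \Gamma, \lambda)$ only and does
not change from step to step:

    $$G(\pi) = \arg\min_{\gamma \in \Gamma} E_\pi \lambda(\gamma, \cdot)
      = \{\gamma \in \Gamma \mid \forall\gamma' \in \Gamma\ E_\pi \lambda(\gamma, \cdot) \le E_\pi \lambda(\gamma', \cdot)\}. \qquad (8)$$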

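Lemma 4. Let $(\Omega, \Gamma, \lambda)$ be a game such that its prediction set
$\Lambda = \{g \in \mathbb{R}^\Omega \mid \exists\gamma \in \Gamma\ \forall\omega \in \Omega\ g(\omega) = \lambda(\gamma, \omega)\}$
is a non-empty compact subset of $\mathbb{R}^\Omega$ and
$\Lambda + [0, \infty]^\Omega$ is convex. Then the set

    $$G_\Lambda = \{(\pi, g) \in \mathcal{P}(\Omega) \times \Lambda \mid \exists\gamma \in G(\pi)\ \forall\omega \in \Omega\ g(\omega) = \lambda(\gamma, \omega)\}$$

is closed and for every $\pi \in \mathcal{P}(\Omega)$ the sets $G(\pi)$ and
$G_\Lambda(\pi) = \{g \mid (\pi, g) \in G_\Lambda\}$ are non-empty and the sets
$G_\Lambda(\pi) + [0, \infty]^\Omega$ are convex.

Proof. Since $\Lambda$ is non-empty and compact, the minimum of $E_\pi g$ over
$g \in \Lambda$ is attained for every $\pi$, and hence $G(\pi)$ and also
$G_\Lambda(\pi)$ is non-empty.

Assume that $g_1, g_2 \in G_\Lambda(\pi) \subseteq \Lambda$ and
$\alpha \in [0,1]$. Then $\alpha g_1 + (1-\alpha) g_2 \ge g$ for some
$g \in \Lambda$ since $\Lambda + [0, \infty]^\Omega$ is convex, and
$E_\pi g \le E_\pi(\alpha g_1 + (1-\alpha) g_2) = E_\pi g_1 = E_\pi g_2$.
Hence $g \in G_\Lambda(\pi)$ and thus $G_\Lambda(\pi) + [0, \infty]^\Omega$ is
convex.

It remains to show that $G_\Lambda$ is closed. Let $g_i \in G_\Lambda(\pi_i)$
and $(\pi_i, g_i)$ converges to $(\pi, g)$; we need to show that
$g \in G_\Lambda(\pi)$. Indeed, $g \in \Lambda$ since $\Lambda$ is compact and
$g_i \to g$. Hence $g = \lambda(\gamma, \cdot)$ for some $\gamma \in \Gamma$.
To show that $\gamma \in G(\pi)$, let us take any $\gamma' \in \Gamma$ and
check that $E_\pi g \le E_\pi g'$, where $g' = \lambda(\gamma', \cdot)$.
Clearly, $E_{\pi_i} g'$ converges to $E_\pi g'$ since $\pi_i \to \pi$. Also
$E_{\pi_i} g_i$ converges to $E_\pi g$. Then for any $\epsilon > 0$ we can find
sufficiently large $i$ so that $E_\pi g \le E_{\pi_i} g_i + \epsilon$ and
$E_{\pi_i} g' \le E_\pi g' + \epsilon$. We have
$E_{\pi_i} g_i \le E_{\pi_i} g'$ since $g_i \in G_\Lambda(\pi_i)$. These three
inequalities imply that $E_\pi g \le E_\pi g' + 2\epsilon$. Since $\epsilon$ is
arbitrary, we have $E_\pi g \le E_\pi g'$. □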

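Note that for convex bounded compact games the conditions of the lemma
are satisfied by definition. For DTOL, the set
$\Lambda = \{g \in \mathbb{R}^{[0,1]^N} \mid \exists\vec p \in \Delta_N\ \forall\vec\omega \in [0,1]^N\ g(\vec\omega) = \vec p \cdot \vec\omega\}$
is obviously non-empty and it is compact and convex as a linear image of
simplex $\Delta_N$.

Now consider a function $H : \Gamma \times \Omega \to \mathbb{R}$ of the form

    $$H(\gamma, \omega) = e^{\eta(\lambda(\gamma, \omega) - \lambda(\gamma', \omega)) - \eta^2/2}, \qquad (9)$$

where parameter $\gamma' \in \Gamma$ and $\eta \ge 0$.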

Lemma 5. Let $(\Omega, \Gamma, \lambda)$ be a game, the range of $\lambda$ is
included in $[0,1]$ and $G(\pi)$ is defined by (8). Then for all
$\gamma' \in \Gamma$, for all $\eta \ge 0$, for all
$\pi \in \mathcal{P}(\Omega)$, and for all $\gamma \in G(\pi)$ it holds

    $$E_\pi H(\gamma, \cdot) \le 1.$$

Proof. Since $\lambda(\gamma, \omega) - \lambda(\gamma', \omega) \in [-1, 1]$
for any $\gamma$, $\gamma'$ and $\omega$, the Hoeffding inequality (see e.g.
[3, Lemma A.1]) implies that

    $$E_\pi e^{\eta(\lambda(\gamma, \cdot) - \lambda(\gamma', \cdot))}
      \le e^{\eta E_\pi(\lambda(\gamma, \cdot) - \lambda(\gamma', \cdot)) + \eta^2/2}.$$

It remains to note that
$E_\pi(\lambda(\gamma, \cdot) - \lambda(\gamma', \cdot)) \le 0$ by definition
of $G(\pi)$. □

Now we can explain what $f_t$ will be used in (7):

    $$f_t(\gamma, \omega) = \sum_{k=1}^K p_{t,k} H_{t,k}(\gamma, \omega), \qquad (10)$$

where $p_{t,k} \ge 0$ are some weights and $H_{t,k}$ are functions of the form
(9) with some parameters $\eta_{t,k}$ and $\gamma'_{t,k}$, cf. (11), (15),
(18), (21), and (22). The sum may be infinite or it can be even an integral
over some measure $p_t(k)$. As in the definition of supermartingale, the index
$t$ may hide the dependence on a long sequence of arguments.
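
The Hoeffding step in the proof of Lemma 5 is easy to check numerically for a
discrete random variable with values in $[-1, 1]$. The following Python sketch
is only a sanity check under that assumption; the function name is
hypothetical.

```python
import math


def hoeffding_gap(values, probs, eta):
    """Return rhs - lhs of  E exp(eta X) <= exp(eta E X + eta^2 / 2)
    for a discrete random variable X taking `values` with `probs`."""
    assert all(-1.0 <= v <= 1.0 for v in values)
    assert abs(sum(probs) - 1.0) < 1e-12
    mean = sum(p * v for p, v in zip(probs, values))
    lhs = sum(p * math.exp(eta * v) for p, v in zip(probs, values))
    rhs = math.exp(eta * mean + eta ** 2 / 2)
    return rhs - lhs
```

A non-negative return value confirms the inequality for the given
distribution; when the mean is non-positive, as for $\gamma \in G(\pi)$, the
right-hand side is at most $e^{\eta^2/2}$, which is exactly what the factor
$e^{-\eta^2/2}$ in (9) compensates.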

Lemma 6. $S_t$ defined by (7), (8), and (10) satisfies the conditions of
Lemma 2 if $(\Omega, \Gamma, \lambda)$ is a bounded convex compact game with
finite $\Omega$ or the conditions of Lemma 3 if $(\Omega, \Gamma, \lambda)$ is
DTOL, where $S_t(\pi)$ is taken for $q(\pi)$ and $\sum_k p_{t,k}$ is taken for
$C$.

Proof. If $g \in S_t(\pi)$ then $g = f_t(\gamma, \cdot)$ for some
$\gamma \in G(\pi)$. Thus we have
$E_\pi g = \sum_{k=1}^K p_{t,k} E_\pi H_{t,k}(\gamma, \cdot) \le \sum_{k=1}^K p_{t,k}$
by Lemma 5.

Let
$\phi_t(g) = \sum_{k=1}^K p_{t,k} e^{\eta_{t,k}(g - \lambda(\gamma'_{t,k}, \cdot)) - \eta_{t,k}^2/2}$.
Note that $g \in G_\Lambda(\pi)$ if and only if $\phi_t(g) \in S_t(\pi)$. Note
also that $\phi_t$ is a convex (and hence continuous) function of $g$.

Clearly, $S_t(\pi) \subseteq \phi_t(\Lambda)$, and $\phi_t(\Lambda)$ is compact
since $\Lambda$ is compact, as remarked after Lemma 4, and $\phi_t$ is
continuous. The set $S_t(\pi)$ is non-empty since $G(\pi)$ is non-empty by
Lemma 4. The graph of $S_t$ is closed since $G_\Lambda$ is closed, and
$S_t(\pi) + [0, \infty]^\Omega$ is convex since
$G_\Lambda(\pi) + [0, \infty]^\Omega$ is convex.

The condition $S_t(\pi_1) = S_t(\pi_2)$ when
$\int \omega\,\pi_1(d\omega) = \int \omega\,\pi_2(d\omega)$ for DTOL follows
from the equality
$E_\pi \lambda(\vec\gamma, \vec\omega) = E_\pi(\vec\gamma \cdot \vec\omega) = \vec\gamma \cdot E_\pi \vec\omega$. □

4 Loss Bounds 

In this section, we consider applications of the supermartingale technique to
obtaining the loss bounds in several different settings. Let us begin with a
simple theorem that shows a clean application of the DFA.

Theorem 7. If $T$ is known in advance then the DFA achieves the bound

    $$L_T \le \min_n L_T^n + \sqrt{2T \ln N}$$

(for DTOL with $N$ actions as well as for PEA with $N$ experts).



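Proof. Let $\eta = \sqrt{2(\ln N)/T}$ and

    $$f_t(\gamma, \omega) = \sum_{n=1}^N \frac{1}{N}
      e^{\eta(L_{t-1} - L_{t-1}^n) - (t-1)\eta^2/2}
      \times e^{\eta(\lambda(\gamma, \omega) - \lambda(\gamma_t^n, \omega)) - \eta^2/2}. \qquad (11)$$

At each step $t$, the DFA finds $\gamma_t$ such that
$f_t(\gamma_t, \omega) \le f_{t-1}(\gamma_{t-1}, \omega_{t-1})$ for all
$\omega \in \Omega$. Such a $\gamma_t$ exists due to Lemma 6 combined with
Lemma 3 for DTOL or Lemma 2 for PEA. Clearly,
$f_{t-1}(\gamma_{t-1}, \omega_{t-1}) = \sum_{n=1}^N \frac{1}{N} \exp(\eta(L_{t-1} - L_{t-1}^n) - (t-1)\eta^2/2)$,
and we get that the DFA applied to $\{f_t\}$ guarantees that

    $$\sum_{n=1}^N \frac{1}{N} \exp\left(\eta(L_T - L_T^n) - T\eta^2/2\right) \le 1.$$

Bounding the sum from below by one additive term, we get the bound. □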
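
The last step can be written out explicitly. The following short derivation is
a routine calculation added for convenience; it keeps a single term of the sum
(all terms are positive) and uses the value of $\eta$ chosen at the beginning
of the proof.

```latex
\frac{1}{N}\exp\bigl(\eta(L_T - L_T^n) - T\eta^2/2\bigr) \le 1
\;\Longrightarrow\;
L_T - L_T^n \le \frac{\ln N}{\eta} + \frac{T\eta}{2}
= \sqrt{\frac{T\ln N}{2}} + \sqrt{\frac{T\ln N}{2}}
= \sqrt{2T\ln N}
\quad\text{for}\quad \eta = \sqrt{2(\ln N)/T}.
```

Since this holds for every $n$, it holds in particular for the best Expert.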

This bound is twice as large as the best bound obtained in [3] (see their 
Theorems 2.2 and 3.7). Our bound is the same as that in Corollary 2.2 in [3]. 



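4.1 Bounds on $\epsilon$-Quantile Regret

The bound in Theorem 7 is guaranteed only once, at step $T$ specified in
advance. The next bound is uniform, that is, holds for any $T$, and it holds
for $\epsilon$-quantile regret for all $\epsilon > 0$.

Theorem 8. For DTOL with $N$ actions, the DFA achieves the bound

    $$\int_0^{1/e} \frac{e^{\eta(L_T - L_T^\epsilon) - T\eta^2/2}}{\eta \left(\ln\frac{1}{\eta}\right)^2}\, d\eta \le \frac{1}{\epsilon} \qquad (12)$$

for any $T$ and any $\epsilon > 0$, where $L_T^\epsilon$ is a value such that
at least an $\epsilon$-fraction of actions has the loss after step $T$ not
greater than $L_T^\epsilon$. In particular, (12) implies for any
$\delta \in (0, 1/4)$

    $$L_T \le L_T^\epsilon + \sqrt{\frac{2}{1 - 2\delta}}
      \sqrt{T\ln\frac{1}{\epsilon} + \frac{1}{2} T\ln\frac{1}{\delta} + 2T\ln\ln T}
      + \max\left\{4,\ 400\ln\frac{1}{\epsilon}\right\}, \qquad (13)$$

which can be further reduced to

    $$R_T^\epsilon \le \frac{1}{\sqrt{1 - 2/\ln T}}
      \sqrt{2T\ln\frac{1}{\epsilon} + 5T\ln\ln T}
      + O\left(\ln\frac{1}{\epsilon}\right). \qquad (14)$$

The bound holds also for PEA; if each of finitely or infinitely many Experts is
assigned some positive weight $p_n$, the sum of all weights being not greater
than 1, the DFA achieves (12) to (14) with $L_T^\epsilon$ being a value such
that the total weight of Experts that have the loss after step $T$ not greater
than $L_T^\epsilon$ is at least $\epsilon$.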

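Proof. We mix all the supermartingales used in (11) over $\eta \in [0, 1/e]$
according to the probability measure

    $$\mu(d\eta) = \frac{d\eta}{\eta \left(\ln\frac{1}{\eta}\right)^2}, \qquad \eta \in [0, 1/e].$$

We apply the DFA (that is, at each step $t$, find $\gamma_t$ such that
$f_t(\gamma_t, \omega) \le f_{t-1}(\gamma_{t-1}, \omega_{t-1})$ for all
$\omega \in \Omega$) to

    $$f_t(\gamma, \omega) = \sum_{n=1}^N \frac{1}{N} \int_0^{1/e}
      e^{\eta(L_{t-1} - L_{t-1}^n) - (t-1)\eta^2/2}
      \times e^{\eta(\lambda(\gamma, \omega) - \lambda(\gamma_t^n, \omega)) - \eta^2/2}\,
      \frac{d\eta}{\eta \left(\ln\frac{1}{\eta}\right)^2} \qquad (15)$$

(for PEA with weighted Experts, the term $1/N$ is replaced by $p_n$) and
achieve $f_T(\gamma_T, \omega_T) \le 1$ for all $T$. Bounding the sum from
below by the sum of terms where $L_T^n \le L_T^\epsilon$, we get

    $$\epsilon \int_0^{1/e} e^{\eta(L_T - L_T^\epsilon) - T\eta^2/2}\,
      \frac{d\eta}{\eta \left(\ln\frac{1}{\eta}\right)^2} \le 1. \qquad (16)$$

Let us estimate the integral. Write $R$ for $L_T - L_T^\epsilon$. Notice that
the exponent $R\eta - T\eta^2/2$ is positive when $0 < \eta < 2R/T$ and attains
its maximum $R^2/(2T)$ at the mid-point of this interval, $\eta = R/T$. Solving
the quadratic inequality

    $$R\eta - T\eta^2/2 \ge (1/2 - \delta) R^2/T$$

gives

    $$\eta \in \left[\left(1 - \sqrt{2\delta}\right)\frac{R}{T},\ \left(1 + \sqrt{2\delta}\right)\frac{R}{T}\right]$$

($0 < \delta < 1/2$) and so (16) implies

    $$e^{(1/2 - \delta)R^2/T}\,
      \frac{\ln(1 + \sqrt{2\delta}) - \ln(1 - \sqrt{2\delta})}
           {\left(\ln(T/R) - \ln(1 + \sqrt{2\delta})\right)\left(\ln(T/R) - \ln(1 - \sqrt{2\delta})\right)}
      \le \frac{1}{\epsilon}$$

when $(1 + \sqrt{2\delta}) R/T \le 1/e$. If the last condition does not hold
and hence $R$ is close to $T$, one can get from (16) that
$T \le 400\ln(1/\epsilon)$. Assuming $\delta < 1/4$, we can obtain

    $$e^{(1/2 - \delta)R^2/T} \le \frac{1}{\epsilon\sqrt{2\delta}} \cdot \frac{1}{2}\ln^2\frac{4T}{R}.$$

For $R \ge 4$, we further obtain

    $$(1/2 - \delta) R^2/T \le \ln\frac{1}{\epsilon} + \frac{1}{2}\ln\frac{1}{\delta} + 2\ln\ln T,$$

which finally leads to (13). Substituting $\delta = 1/\ln T$, we get (14). □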
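
Two elementary integrals are used implicitly in this proof. The derivation
below is a sketch that assumes the mixing density is
$1/(\eta \ln^2 \eta)$ on $[0, 1/e]$, which is the reading consistent with the
quotient of logarithms in the estimate above; the symbols $a$ and $b$ are
introduced only for this illustration.

```latex
\int_0^{1/e} \frac{d\eta}{\eta \ln^2 \eta}
  = \Bigl[-\frac{1}{\ln\eta}\Bigr]_{0}^{1/e} = 1 - 0 = 1,
\qquad
\int_a^b \frac{d\eta}{\eta \ln^2 \eta}
  = \frac{1}{\ln(1/b)} - \frac{1}{\ln(1/a)}
  = \frac{\ln b - \ln a}{\ln(1/a)\,\ln(1/b)}
\quad (0 < a < b \le 1/e).
```

The first identity shows that the density integrates to 1. The second, with
$a = (1 - \sqrt{2\delta})R/T$ and $b = (1 + \sqrt{2\delta})R/T$, gives the mass
of the interval on which the exponent is at least $(1/2 - \delta)R^2/T$.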



The bound (14) is not optimal asymptotically in $T$: it grows as
$O(\sqrt{T \ln\ln T})$ as $T \to \infty$, instead of $O(\sqrt{T})$. The next
theorem gives an asymptotically optimal bound but using a "fake" DFA.

Theorem 9. For DTOL with $N$ actions, there exists a strategy that achieves
the bound

$$L_T \le L^\epsilon_T + 2\sqrt{T\ln\frac{1}{\epsilon}} + 7\sqrt{T} \qquad (17)$$

for any $T$ and any $\epsilon$, where $L^\epsilon_T$ is a value such that at least an $\epsilon$-fraction of actions
has loss after step $T$ not greater than $L^\epsilon_T$.

The bound also holds for PEA; if each of finitely or infinitely many Experts is
assigned some positive weight $p_n$, the sum of all weights being not greater than
1, the strategy achieves (17) with $L^\epsilon_T$ being a value such that the total weight of
Experts that have loss after step $T$ not greater than $L^\epsilon_T$ is at least $\epsilon$.
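Proof. The algorithm in this theorem is not the DFA and does not use supermartingales
properly: we use values $f_T(\gamma_T, \omega_T)$ that may increase at some steps
and $f_T(\gamma_T, \omega_T) \le f_{T-1}(\gamma_{T-1}, \omega_{T-1})$ does not hold. Nevertheless, the increases of
$f_T$ stay bounded so that $f_T(\gamma_T, \omega_T) \le 1$ always holds.

Let $1/c = \sum_{i=1}^{\infty} 1/i^2$. At step $T$, our algorithm finds $\gamma_T$ such that $f_T(\gamma_T, \omega) \le
C_T$ for all $\omega$, where

$$f_T(\gamma, \omega) = \sum_{n=1}^{N} \frac{1}{N} \sum_{i=1}^{\infty} \frac{c}{i^2}\, e^{(i/\sqrt{T})(L_{T-1} - L^n_{T-1}) - (i^2/2\sqrt{T})\sum_{t=1}^{T-1}(1/\sqrt{t})} \times e^{(i/\sqrt{T})(\lambda(\gamma,\omega) - \lambda(\gamma^n_T,\omega)) - (i/\sqrt{T})^2/2} \qquad (18)$$

and

$$C_T = \sum_{n=1}^{N} \frac{1}{N} \sum_{i=1}^{\infty} \frac{c}{i^2}\, e^{(i/\sqrt{T})(L_{T-1} - L^n_{T-1}) - (i^2/2\sqrt{T})\sum_{t=1}^{T-1}(1/\sqrt{t})}.$$

For PEA with weighted Experts, it is sufficient to replace $1/N$ by $p_n$ in the
definitions of $f_T$ and $C_T$.

Note that $f_T$ has the form (10), hence Lemma 6 applies, and due to Lemma 3
or Lemma 2 such a $\gamma_T$ exists.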

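Let us prove by induction over $T$ that $C_T \le 1$. It is trivial for $T = 0$, since
$L_0 = L^n_0 = 0$ and $\sum_{t=1}^{0}(1/\sqrt{t}) = 0$. Assume that $C_T \le 1$ and prove that $C_{T+1} \le 1$.
By the choice of $\gamma_T$, we know that $f_T(\gamma_T, \omega_T) \le C_T \le 1$. Since the function
$x^\alpha$ is concave for $0 < \alpha < 1$, we have

$$1 \ge \left(f_T(\gamma_T, \omega_T)\right)^{\sqrt{T}/\sqrt{T+1}} = \left(\sum_{n=1}^{N} \frac{1}{N} \sum_{i=1}^{\infty} \frac{c}{i^2}\, e^{(i/\sqrt{T})(L_T - L^n_T) - (i^2/2\sqrt{T})\sum_{t=1}^{T}(1/\sqrt{t})}\right)^{\sqrt{T}/\sqrt{T+1}}$$

$$\ge \sum_{n=1}^{N} \frac{1}{N} \sum_{i=1}^{\infty} \frac{c}{i^2}\, e^{(i/\sqrt{T+1})(L_T - L^n_T) - (i^2/2\sqrt{T+1})\sum_{t=1}^{T}(1/\sqrt{t})} = C_{T+1}.$$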



Now it is easy to get the loss bound. Assume that for an $\epsilon$-fraction of
Experts their losses are smaller than or equal to $L^\epsilon_T$. Then $f_T(\gamma_T, \omega_T)$ can
be bounded from below by

$$\epsilon \sum_{i=1}^{\infty} \frac{c}{i^2}\, e^{(i/\sqrt{T})(L_T - L^\epsilon_T) - (i^2/2\sqrt{T})\sum_{t=1}^{T}(1/\sqrt{t})}.$$

Further, bounding the infinite sum by one of the terms, we get

$$e^{(i/\sqrt{T})(L_T - L^\epsilon_T) - (i^2/2\sqrt{T})\sum_{t=1}^{T}(1/\sqrt{t})} \le \frac{i^2}{\epsilon c}.$$

Taking the logarithm, using $\sum_{t=1}^{T}(1/\sqrt{t}) \le 2\sqrt{T}$ and rearranging the terms,
we get

$$L_T \le L^\epsilon_T + \frac{\sqrt{T}}{i}\left(i^2 + \ln\frac{1}{\epsilon} + 2\ln i + \ln(1/c)\right).$$

Letting $i = \left\lceil\sqrt{\ln(1/\epsilon)}\right\rceil + 1$ and using the estimates $i \le \sqrt{\ln(1/\epsilon)} + 2$, $1/i \le 1$,
$(\ln i)/i \le 2$, $(\ln(1/\epsilon))/i \le \sqrt{\ln(1/\epsilon)}$, and $\ln(1/c) = \ln(\pi^2/6) \le 1$, we obtain the
final bound. □
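The last step of the proof is plain arithmetic and can be verified mechanically. The sketch below assumes the intermediate bound has the form $(\sqrt{T}/i)(i^2 + \ln(1/\epsilon) + 2\ln i + \ln(1/c))$ with $c = 6/\pi^2$ and that the final bound is $2\sqrt{T\ln(1/\epsilon)} + 7\sqrt{T}$; the function names are ours and purely illustrative.

```python
import math

# Normalising constant c with sum_{i>=1} c / i^2 = 1, i.e. 1/c = pi^2 / 6.
C = 6.0 / math.pi ** 2


def bound_for_index(T: float, eps: float, i: int) -> float:
    """Regret bound obtained by keeping only the i-th term of the infinite sum."""
    return math.sqrt(T) / i * (
        i * i + math.log(1.0 / eps) + 2.0 * math.log(i) + math.log(1.0 / C)
    )


def chosen_index(eps: float) -> int:
    """The index i = ceil(sqrt(ln(1/eps))) + 1 used in the proof."""
    return math.ceil(math.sqrt(math.log(1.0 / eps))) + 1


def final_bound(T: float, eps: float) -> float:
    """Regret term 2 sqrt(T ln(1/eps)) + 7 sqrt(T)."""
    return 2.0 * math.sqrt(T * math.log(1.0 / eps)) + 7.0 * math.sqrt(T)


def inverse_sqrt_sum(T: int) -> float:
    """sum_{t=1}^{T} 1/sqrt(t), which the proof bounds by 2 sqrt(T)."""
    return sum(1.0 / math.sqrt(t) for t in range(1, T + 1))
```

On a grid of values of $T$ and $\epsilon$, `bound_for_index(T, eps, chosen_index(eps))` should never exceed `final_bound(T, eps)`.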



Remark 1. For DTOL and for PEA with a finite number of Experts, the
infinite sum over $i$ in the proof can be replaced by the sum up to $\sqrt{\ln N} + 1$.
However, one should keep decreasing weights $c/i^2$: for uniform weights the
bound will have an additional term of the form $O((\ln\ln N)/\ln(1/\epsilon))$.



Remark 2. Probably, the first bound for $\epsilon$-quantile regret was stated (implicitly)
in [S]. More precisely, that paper considered an even more general regret
notion: Theorem 1 in [3] gives a bound for PEA with weighted Experts under
the logarithmic loss of the form

$$L_T \le \sum_{n=1}^{N} u_n L^n_T + \sum_{n=1}^{N} u_n \ln\frac{u_n}{p_n}$$

for any $u \in \Delta_N$; $p_1, \ldots, p_N$ are weights of Experts. Here $p_n$ are known to the
algorithm in advance, whereas $u_n$ are not known and the bound holds uniformly
for all $u_n$. Taking $u_n = 0$ for Experts not from the $\epsilon$-quantile of the best Experts,
and uniform $u_n$ over Experts from the $\epsilon$-quantile, we get the bound in terms of
$L^\epsilon_T$. It can be easily checked that the strategy in Theorem 9 also achieves the
following bound:



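$$L_T \le \sum_{n=1}^{N} u_n L^n_T + 2\sqrt{T\sum_{n=1}^{N} u_n \ln\frac{u_n}{p_n}} + 7\sqrt{T}$$

for any $u \in \Delta_N$ and any $T$. In Theorem 8 one can replace $L^\epsilon_T$ by $\sum_{n=1}^{N} u_n L^n_T$
and $\ln(1/\epsilon)$ by $\sum_{n=1}^{N} u_n \ln(u_n/p_n)$ as well.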
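To see why the relative-entropy term $\sum_n u_n \ln(u_n/p_n)$ reduces to $\ln(1/\epsilon)$ in the $\epsilon$-quantile case, here is a small Python illustration. It assumes uniform prior weights $p_n = 1/N$ and a comparison vector $u$ that is uniform over the $\lceil \epsilon N \rceil$ Experts with the smallest losses; the loss values and function names are invented for the example.

```python
import math
from typing import List


def relative_entropy(u: List[float], p: List[float]) -> float:
    """sum_n u_n ln(u_n / p_n), with the convention 0 ln 0 = 0."""
    return sum(un * math.log(un / pn) for un, pn in zip(u, p) if un > 0.0)


def quantile_comparison_vector(losses: List[float], eps: float) -> List[float]:
    """Uniform weights on the ceil(eps * N) Experts with the smallest losses."""
    n = len(losses)
    k = math.ceil(eps * n)
    best = sorted(range(n), key=lambda j: losses[j])[:k]
    u = [0.0] * n
    for j in best:
        u[j] = 1.0 / k
    return u


# Hypothetical cumulative losses of N = 8 Experts; eps = 1/4 selects the best two.
example_losses = [5.0, 1.0, 7.0, 3.0, 9.0, 2.0, 8.0, 6.0]
example_u = quantile_comparison_vector(example_losses, 0.25)
example_p = [1.0 / 8] * 8
```

Here the relative entropy of `example_u` with respect to `example_p` equals $\ln 4 = \ln(1/\epsilon)$, and the $u$-weighted loss is at most the loss of the second-best Expert, which plays the role of $L^\epsilon_T$.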

Remark 3. Theorem IH] can be also adapted to discounted regrets of the form 
Lt = ^ ct)"^^* ^{'lt,<^t) for a known a. Then e in the bound is replaced 

by a, and L^ by L^ = ^^^^(1 - a)^-*A(7r, w*)- 

4.2 Discussion of the Bounds 

For a game with $N$ Experts, the best bound, uniform in $T$, is given by [3,
Theorem 2.3]:

$$L_T \le L^n_T + \sqrt{2T\ln N} + \sqrt{\frac{\ln N}{8}}. \qquad (19)$$

The bounds (14) and (17) with $\epsilon = 1/N$ are always worse than (19). In the
bound (17) the leading coefficient at $\sqrt{T\ln N}$ is $\sqrt{2}$ times as much. In the
bound (14) the coefficient at $\sqrt{T\ln N}$ is the same, but the other terms are
larger, and even the asymptotics is worse when $N$ is fixed and $T \to \infty$.
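A small numerical comparison may make this concrete. The sketch below assumes that the regret term of (19) is $\sqrt{2T\ln N} + \sqrt{(\ln N)/8}$ and that the regret term of (17) is $2\sqrt{T\ln(1/\epsilon)} + 7\sqrt{T}$; the function names are ours.

```python
import math


def regret_19(T: float, N: int) -> float:
    """Assumed regret term of (19): sqrt(2 T ln N) + sqrt(ln N / 8)."""
    return math.sqrt(2.0 * T * math.log(N)) + math.sqrt(math.log(N) / 8.0)


def regret_17(T: float, eps: float) -> float:
    """Assumed regret term of (17): 2 sqrt(T ln(1/eps)) + 7 sqrt(T)."""
    return 2.0 * math.sqrt(T * math.log(1.0 / eps)) + 7.0 * math.sqrt(T)


def leading_coefficient_ratio() -> float:
    """Ratio of the coefficients at sqrt(T ln N) in (17) and (19)."""
    return 2.0 / math.sqrt(2.0)
```

With $\epsilon = 1/N$, `regret_17(T, 1.0 / N)` exceeds `regret_19(T, N)` for every $T \ge 1$ and $N \ge 2$, and the ratio of the leading coefficients is $\sqrt{2}$.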

However, it appears that the bound (19) cannot be transferred to $\epsilon$-quantile
regret $R^\epsilon_T = L_T - L^\epsilon_T$. The proof of Theorem 2.3 in [3] heavily relies on tracking
the loss of only one best Expert, and it is unclear whether the existence of several
good (or identical) Experts can be exploited in this proof. The experiments
reported in [3] show that algorithms with good best Expert bounds may have
rather bad performance when the nominal number of Experts is much greater
than the effective number of Experts.

The first (and the only, as far as we know) bound specifically formulated for
$\epsilon$-quantile regret is proven for the NormalHedge algorithm in [4, Theorem 1]:

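$$L_T \le L^\epsilon_T + \sqrt{\left(1 + \ln\frac{1}{\epsilon}\right)\left(3(1 + 50\delta)T + \frac{16\ln^2 N}{\delta}\left(\frac{10.2}{\delta^2} + \ln N\right)\right)}, \qquad (20)$$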

which holds uniformly for all $\delta \in (0, 1/2]$. Note that this bound depends on the
effective number of actions $1/\epsilon$ and at the same time on the nominal number
of actions $N$. The latter dependence is weak, but probably prevents the use of
NormalHedge with infinitely many Experts.

The main advantage of our bounds in Theorems 8 and 9 is that they are
stated purely in terms of the effective number of Experts. In a sense, the DFA does
not need to know in advance the number of Experts.

Remark 4. To obtain a precise statement about the unknown number of Experts,
one can consider the setting where Experts may come at some later steps;
the regret to a late Expert is accumulated over the steps after his coming. This is
a simple time selection function (see Subsection 4.3), which switches from 0 to
1 only once. Our algorithms and bounds can be easily adapted for this setting:
we must consider infinitely many Experts almost all of which are inactive; and
then proceed similarly to Theorem 11.

Both our bounds are worse than (20) asymptotically when $\epsilon$ and $N$ are fixed
and $T \to \infty$. In this case, the regret term in (20) grows as $\sqrt{3T\ln(1/\epsilon) + 3T}$,
whereas in (17) it grows as $\sqrt{4T\ln(1/\epsilon)} + 7\sqrt{T}$ and in (14), the worst bound,
it grows as $\sqrt{5T\ln\ln T + 2T\ln(1/\epsilon)}$.

On the other hand, our bounds are better when $T$ is relatively small. The
term $\ln\ln T$ is small for any reasonable practical application (e.g., $\ln\ln T < 4$ if
$T$ is the age of the universe expressed in microseconds), and then the main term
in (14) is $\sqrt{2T\ln(1/\epsilon)}$, which even fits the optimal bound (19). Bound (17)
improves over ^ for T < 10*^ In" TV. 
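The claim about $\ln\ln T$ is easy to check. The following sketch assumes an age of the universe of about 13.8 billion years (a round figure supplied by us, not by the text) and converts it to microseconds.

```python
import math

# Assumed round figure for the age of the universe, in years.
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MICROSECONDS_PER_SECOND = 1e6

# Number of steps T if one step is taken every microsecond.
T_UNIVERSE = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR * MICROSECONDS_PER_SECOND


def lnln(T: float) -> float:
    """The iterated logarithm ln ln T that appears in the bound."""
    return math.log(math.log(T))
```

Under this assumption `lnln(T_UNIVERSE)` comes out just below 4, in line with the parenthetical remark above.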

Now let us say a few words about known algorithms for which $\epsilon$-quantile
regret bounds were not formulated explicitly, but can easily be obtained.

The Weighted Average Algorithm, which is used to obtain bound (19), can
be analysed in a manner different from [3, Theorem 2.3], see [11]. Then one can
obtain the following bound for $\epsilon$-quantile regret:

$$L_T \le L^\epsilon_T + \frac{1}{c}\sqrt{T}\ln\frac{1}{\epsilon} + c\sqrt{T},$$

where the constant $c > 0$ is arbitrary but must be fixed in advance. If $\epsilon$ is not
known and hence $c$ cannot be adapted to $\epsilon$, the leading term is $O(\sqrt{T}\ln\frac{1}{\epsilon})$,
which is worse than (17) for small $\epsilon$ (that is, if we consider a large effective
number of actions).

For the Aggregating Algorithm [13] (which can be considered as a special
case of the DFA for a certain supermartingale, as shown in [5]), the bound can
be trivially adapted to $\epsilon$-quantile regret:

$$L_T \le cL^\epsilon_T + a\ln\frac{1}{\epsilon},$$

where the possible constants $c \ge 1$ and $a$ depend on the loss function. However,
in the case of DTOL or arbitrary convex games, the constant $c$ is strictly greater
than 1 and the bound may be much worse than (14) and (17) (when $L^\epsilon_T$ grows
significantly faster than $\sqrt{T}$). At the same time, this bound is much better
when $L^\epsilon_T \approx 0$ (there is at least an $\epsilon$-fraction of "perfect" Experts).

For the standard setting with a known number of Experts, other "small
loss" bounds, of the form $L_T \le L^n_T + O(\sqrt{L^n_T})$, were obtained. The authors
of [1] posed an open question whether similar bounds can be obtained if the
(effective) number of actions is not known. We leave the question open.

4.3 Internal Regret and Time Selection Functions 

It was shown in [5] and in [7] that the loss bounds obtained by the DFA can
be easily transferred to second-guessing experts and sleeping experts models. A 
second-guessing expert is a (known) function of Learner's decision. Informally, 
a second-guessing expert explains how Learner could improve (hopefully) his 
performance. Sleeping experts (or specialists), introduced in [3], may be inactive
at some steps, abstaining from announcing their decision (a specialist may decide
that the current problem is outside her area of expertise). The regret of Learner
to a sleeping expert is counted over the steps when the expert was active. 

Models similar to second-guessing experts and sleeping experts were
studied in DTOL as internal (or wide range) regret and time selection (or activation)
functions respectively (see [12] for a review). The internal regret compares
Learner's loss not to the loss of a fixed action, but to the loss of a modification
rule of the form "Every time Learner selected action $n$ he should have selected
$n'$ instead" (more formally, all the weight $\gamma_{t,n}$ assigned to action $n$ should have
been added to $\gamma_{t,n'}$). The wide range regret deals with more general modification
rules which may replace each action by some other action. Note that a
fixed action $n$ is also a modification rule that suggests using $n$ instead of any
other action.

A time selection function attached to a modification rule assigns a scaling
factor from $[0, 1]$ to each step. The regret of Learner to this rule is a sum of the
regrets at each step weighted by these factors. This weight can be regarded as
a degree of the specialist's certainty: when the rule is known to be inapplicable for
some reason, the weight is zero; and when the rule is partially relevant, the rule
takes only partial responsibility.

As has been recently shown [12], an algorithm achieving in DTOL with
$N$ actions some regret bound with respect to $N$ can be transformed into an
algorithm that achieves the same bound with respect to $K$ for $K$ modification
rules with attached time selection functions. This gives the best regret bound
$O(\sqrt{T\ln K})$.

We show how to extend the results of Theorems 8 and 9 to internal regret
and time selection settings. We do not apply the general method of [12], but
directly modify our supermartingales and proofs. Remarkably, we need very
modest changes.

A modification rule is represented by an $N \times N$ stochastic matrix $M$: the matrix
elements are non-negative and the sum of every column is 1. The (one-step)
regret of Learner's decision $\gamma \in \Delta_N$ to the modification rule $M$ on the outcome
$\omega \in [0, 1]^N$ is $\gamma \cdot \omega - (M\gamma) \cdot \omega$, where $M\gamma$ is the product of matrix $M$ and
vector-column $\gamma$. The total regret after step $T$ on the sequence of outcomes $\omega_1, \omega_2, \ldots$
of Learner predicting $\gamma_1, \gamma_2, \ldots$ with respect to a modification rule $M(t)$ with
attached time selection function $I(t)$ is

$$R_T = \sum_{t=1}^{T} I(t)\left(\gamma_t \cdot \omega_t - (M(t)\gamma_t) \cdot \omega_t\right)$$

(cf. Rhjj in [12]). 
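The definition of the total regret with respect to a modification rule can be spelled out in a few lines of Python. This is only an illustration under our own conventions: matrices are lists of rows, `M[i][j]` is the share of the weight of action `j` moved to action `i`, and the helper names `swap_rule` and `fixed_action_rule` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def is_column_stochastic(M: Matrix, tol: float = 1e-9) -> bool:
    """Non-negative entries and every column summing to 1."""
    n = len(M)
    nonneg = all(M[i][j] >= 0.0 for i in range(n) for j in range(n))
    cols = all(abs(sum(M[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))
    return nonneg and cols


def apply_rule(M: Matrix, gamma: Vector) -> Vector:
    """The modified decision M * gamma (matrix times column vector)."""
    return [dot(row, gamma) for row in M]


def total_regret(gammas, omegas, rules, selectors) -> float:
    """sum_t I(t) * (gamma_t . omega_t - (M(t) gamma_t) . omega_t)."""
    return sum(
        I * (dot(g, w) - dot(apply_rule(M, g), w))
        for g, w, M, I in zip(gammas, omegas, rules, selectors)
    )


def swap_rule(n: int, src: int, dst: int) -> Matrix:
    """Rule 'whenever you chose src, choose dst instead'."""
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M[src][src] = 0.0
    M[dst][src] = 1.0
    return M


def fixed_action_rule(n: int, target: int) -> Matrix:
    """Rule 'always choose target', i.e. an ordinary fixed action."""
    return [[1.0 if i == target else 0.0 for _ in range(n)] for i in range(n)]
```

With `fixed_action_rule`, the one-step regret reduces to $\gamma \cdot \omega - \omega_n$, the usual regret to a fixed action; setting a selector value to 0 removes the corresponding step from the sum.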

Remark 5. The definition above reflects a slightly more general notion of a
modification rule, which allows, for example, the rules that mean "instead of $n$
select at random $n'$ or $n''$ equiprobably". Khot and Ponnuswami [12] do not
discuss such rules explicitly, but it appears that their method works for them
as well (unless we miss some subtlety in the proof).

First let us obtain an analogue of Theorem 9. We formulate the bound with
respect to the effective number of modification rules. It is very probable that
the method of [12] also transforms a bound in terms of the effective number of
actions into a bound in terms of the effective number of modification rules, but
we did not check.

Theorem 10. In DTOL with $N$ actions, let us have $K$ modification rules
$M_k(t)$, each assigning a stochastic $N \times N$ matrix to each step $t$, with attached
time selection functions $I_k(t)$ assigning a number from $[0, 1]$. (The modification
rule numbered $k$ may arbitrarily change in time and may depend on the whole
history, and so may the time selection function.) There is a strategy that achieves
the bound

$$R^\epsilon_T \le 2\sqrt{T\ln\frac{1}{\epsilon}} + 7\sqrt{T}$$

for any $T$ and any $\epsilon$, where $R^\epsilon_T$ is a value such that for at least an $\epsilon$-fraction of the
rules the regret $R^k_T$ of rule $k$ after step $T$ is not less than $R^\epsilon_T$.

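Proof. The proof is very similar to the proof of Theorem 9. The only change in
the algorithm is that in (18) we replace $(\lambda(\gamma, \omega) - \lambda(\gamma^n_T, \omega)) = \gamma \cdot \omega - \omega_n$ by
$I_k(T)(\gamma \cdot \omega - (M_k(T)\gamma) \cdot \omega)$ and thus apply the same algorithm with

$$f_T(\gamma, \omega) = \sum_{k=1}^{K} \frac{1}{K} \sum_{i=1}^{\infty} \frac{c}{i^2}\, e^{(i/\sqrt{T}) R^k_{T-1} - (i^2/2\sqrt{T})\sum_{t=1}^{T-1}(1/\sqrt{t})} \times e^{(i/\sqrt{T}) I_k(T)(\gamma \cdot \omega - (M_k(T)\gamma) \cdot \omega) - (i/\sqrt{T})^2/2}. \qquad (21)$$

We need to check that the conditions of Lemma 3 are satisfied. It is enough to
observe that $I_k(T) \le 1$ and that $\exp\left((i/\sqrt{T}) I_k(T)(\gamma \cdot \omega - (M_k(T)\gamma) \cdot \omega)\right)$ is
convex in $\gamma$; then the proof of Lemma 3 applies without changes. The loss
bound is obtained as in Theorem 9. □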

Theorem 8 can be adapted in a similar way. But we formulate another
analogue of the theorem: the bound includes the total number of modification
rules instead of the effective number of them, but the regret of each rule $k$
is bounded in terms of the actual activity time (or awake time) $\sum_{t=1}^{T} I_k(t)$ of
the rule, not the total time $T$. We do not know whether bounds referring to
the awake time were explicitly stated anywhere; however, a bound of this kind
can be obtained from bounds that depend on the loss of the rule (or action), as
in [3J Theorem 16] or [TH Theorem 5]. 
Theorem 11. In DTOL with $N$ actions, let us have $K$ modification rules
$M_k(t)$, each assigning a stochastic $N \times N$ matrix to each step $t$, with attached
time selection functions $I_k(t)$ assigning a number from $[0, 1]$. The DFA achieves
the bound



'7 



where $T_k(T) = \sum_{t=1}^{T} I_k(t)$, for any $T$ and $k = 1, \ldots, K$. In particular, the above
bound implies for any $\delta \in (0, 1/4)$

$$R^k_T \le \sqrt{\frac{2}{1 - 2\delta}}\,\sqrt{T_k(T)\ln K + \frac{1}{2}T_k(T)\ln\frac{1}{\delta} + 2T_k(T)\ln\ln T_k(T)} + \max\{4, 400\ln K\},$$

which can be further reduced to

$$R^k_T \le \left(1 + \frac{1}{\ln T_k(T)}\right)\sqrt{2T_k(T)\ln K + 5T_k(T)\ln\ln T_k(T)} + O(\ln K).$$

Proof. We change the supermartingale used for Theorem 8 similarly to the
proof of Theorem 10. Namely, we apply the DFA to the supermartingale

$$f_T(\gamma, \omega) = \sum_{k=1}^{K} \frac{1}{K} \int_0^{1/e} \frac{d\eta}{\eta\left(\ln\frac{1}{\eta}\right)^2}\, e^{\eta R^k_{T-1} - T_k(T-1)\eta^2/2} \times e^{\eta I_k(T)(\gamma \cdot \omega - (M_k(T)\gamma) \cdot \omega) - (\eta I_k(T))^2/2}. \qquad (22)$$

Note that in contrast to the proof of Theorem 10, $I_k(t)$ appears also in the
"Hoeffding correction term" $e^{-(\eta I_k(t))^2/2}$. The rest of the proof does not change
much. To get the loss bound we observe that $\sum_{t=1}^{T}(I_k(t))^2 \le T_k(T)$ since
$I_k(t) \in [0, 1]$. □



4.4 A Toy Example of a Multiobjective Bound 

In this subsection, we discuss bounds with respect to two loss functions. In j^, 
we showed how to cope with several mixable loss functions. Here we combine 
a mixable loss function (the square loss) with a non-mixable one (the absolute 
loss). 

Let us describe an informal prediction setting where such a combination of 
loss functions can make sense. We want to predict the probability of rain. We 
have two groups of Experts. The first group consists of Metoffices that give the
probability and evaluate the result according to the Brier (square) loss function.
The second group is Simpletons; they give a boolean ('rain'/'no rain') prediction
and count the number of errors (the simple prediction game). We must provide
a pair, a probability and a boolean prediction, and the two components of our
prediction must agree in the following sense: if we give probability of rain more
than one half, we must predict 'rain'; if we give probability of rain less than
one half, we must predict 'no rain'; only if we give the probability 1/2, we
may choose the boolean prediction arbitrarily (so we can randomize here). In
the theorem below we bound both Learner's Brier loss and Learner's expected 
(over the internal randomizer) number of errors. 
Theorem 12. Assume that we are given $K$ Experts that give predictions $p^k_t \in
[0, 1]$ and $M$ Experts that give predictions $b^m_t \in \{0, 1\}$. Learner is allowed to
give predictions $(p, \tilde p) \in [0, 1] \times [0, 1]$, with the following restriction: if $p < 1/2$
then $\tilde p = 0$ and if $p > 1/2$ then $\tilde p = 1$. Then there exists a strategy for Learner
guaranteeing for any sequence of outcomes $\omega_1, \omega_2, \ldots$ that for any $T$ and for any
$k$ it holds

T T ^ 

t=i t=i 

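and for any $T$ and for any $m$ it holds

$$\sum_{t=1}^{T} |\tilde p_t - \omega_t| \le \sum_{t=1}^{T} [b^m_t \ne \omega_t] + O\left(\sqrt{T\ln(K + M) + T\ln\ln T}\right),$$

where $[b^m_t \ne \omega_t] = 1$ if $b^m_t \ne \omega_t$ and $[b^m_t \ne \omega_t] = 0$ otherwise.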

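Proof. Let $A = \{(p, \tilde p) \in [0, 1]^2 \mid \tilde p = 0 \text{ if } p < 1/2 \text{ and } \tilde p = 1 \text{ if } p > 1/2\} =
\{(p, 0) \mid p \in [0, 1/2)\} \cup \{(1/2, \tilde p) \mid \tilde p \in [0, 1]\} \cup \{(p, 1) \mid p \in (1/2, 1]\}$. We apply
the DFA to the supermartingale $S_T$ on $\Omega = \{0, 1\}$ defined by (7) with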



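$$\begin{aligned} f_T(p, \tilde p, \omega) = {} & \frac{1}{K + M} \sum_{k=1}^{K} e^{2\sum_{t=1}^{T-1}\left((p_t - \omega_t)^2 - (p^k_t - \omega_t)^2\right)} \times e^{2\left((p - \omega)^2 - (p^k_T - \omega)^2\right)} \\ & + \frac{1}{K + M} \sum_{m=1}^{M} \int_0^{1/e} \frac{d\eta}{\eta\left(\ln\frac{1}{\eta}\right)^2}\, e^{\eta\sum_{t=1}^{T-1}\left(|\tilde p_t - \omega_t| - [b^m_t \ne \omega_t]\right) - (T-1)\eta^2/2} \times e^{\eta\left(|\tilde p - \omega| - [b^m_T \ne \omega]\right) - \eta^2/2} \end{aligned} \qquad (23)$$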

and $G(\pi) = \{(p, \tilde p) \in A \mid p = \pi(1)\}$. To ensure that $S_T$ is a supermartingale,
we need to check that $E_\pi(|\tilde p - \omega| - [b^m_T \ne \omega]) \le 0$ if $(\pi(1), \tilde p) \in G(\pi)$. Then we
can refer to Lemma 6 and [5, Lemma 2].

Indeed, $E_\pi(|\tilde p - \omega| - [b^m_T \ne \omega]) = \pi(1)\left(1 - \tilde p - (1 - b^m_T)\right) + \pi(0)\left(\tilde p - b^m_T\right) =
(\pi(0) - \pi(1))(\tilde p - b^m_T)$. If $\pi(1) > 1/2$ then $\pi(0) < 1/2$ and $\tilde p = 1 \ge b^m_T$. If
$\pi(1) < 1/2$ then $\pi(0) > 1/2$ and $\tilde p = 0 \le b^m_T$. If $\pi(1) = 1/2$ then $\pi(0) = 1/2$.
Obviously, in all the cases $(\pi(0) - \pi(1))(\tilde p - b^m_T) \le 0$.
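The case analysis above can be checked by brute force. The Python sketch below enumerates the two outcomes, computes the expectation of $|\tilde p - \omega| - [b \ne \omega]$ under a distribution with $\pi(1)$ given, and compares it with the product $(\pi(0) - \pi(1))(\tilde p - b)$; the function names, and the notation $\tilde p$ for the boolean component of Learner's prediction, are ours.

```python
from typing import List


def indicator(b: int, w: int) -> float:
    """Iverson bracket [b != w]."""
    return 1.0 if b != w else 0.0


def expected_excess(pi1: float, p_tilde: float, b: int) -> float:
    """E_pi(|p~ - w| - [b != w]) for w in {0, 1} with pi(1) = pi1."""
    return sum(
        prob * (abs(p_tilde - w) - indicator(b, w))
        for w, prob in ((0, 1.0 - pi1), (1, pi1))
    )


def admissible_second_components(pi1: float) -> List[float]:
    """Values of p~ allowed by the restriction when Learner's p equals pi(1)."""
    if pi1 < 0.5:
        return [0.0]
    if pi1 > 0.5:
        return [1.0]
    # At p = 1/2 any p~ in [0, 1] is allowed; a few sample points suffice here.
    return [0.0, 0.25, 0.5, 0.75, 1.0]
```

For every admissible pair the expectation is non-positive, which is the condition needed for the supermartingale property.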

The bound follows in the usual way (cf. Theorem 8). □

Remark 6. Let us discuss how to find the numbers $p$ and $\tilde p$ such that $f_T(p, \tilde p, 0) \le
1$ and $f_T(p, \tilde p, 1) \le 1$. Consider $x \in [0, 2]$ and two functions

$$p(x) = \begin{cases} x, & \text{if } x < 1/2, \\ 1/2, & \text{if } x \in [1/2, 3/2], \\ x - 1, & \text{if } x > 3/2, \end{cases}$$

and

$$\tilde p(x) = \min\{1, \max\{x - 1/2, 0\}\}.$$

Clearly, $p(x)$ and $\tilde p(x)$ are continuous functions of $x$. Let

$$g(x, \omega) = f_T(p(x), \tilde p(x), \omega) - 1.$$

It is obvious that if $g(x_0, 0) \le 0$ and $g(x_0, 1) \le 0$ then we can take $p(x_0)$ and
$\tilde p(x_0)$ as the $p$ and $\tilde p$ we are looking for. The supermartingale property of $S_T$ and
the definition of $S_T$ imply that

$$p(x)\, g(x, 1) + (1 - p(x))\, g(x, 0) \le 0.$$

Substituting $x = 0$, we get $g(0, 0) \le 0$. Substituting $x = 2$, we get $g(2, 1) \le 0$.
If $g(0, 1) \le 0$ or $g(2, 0) \le 0$, we can take $x_0 = 0$ or $x_0 = 2$ respectively.
Otherwise, consider the function $\phi(x) = g(x, 1) - g(x, 0)$. It is continuous,
$\phi(0) > 0$ and $\phi(2) < 0$, hence there exists $x_0$ such that $\phi(x_0) = 0$. Clearly,
$g(x_0, 0) = g(x_0, 1) \le 0$.
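The search described in this remark can be carried out by bisection. The sketch below is a minimal illustration, not the strategy from the theorem: the function `toy_g` is a hypothetical stand-in with one square-loss Expert predicting `q`, one boolean Expert predicting `b`, and a single fixed `eta` instead of a mixture over $\eta$. The path functions follow the piecewise definitions above, with `p_tilde_of` denoting the boolean component.

```python
import math
from typing import Callable


def p_of(x: float) -> float:
    """First component of the path: x, then 1/2, then x - 1."""
    if x < 0.5:
        return x
    if x <= 1.5:
        return 0.5
    return x - 1.0


def p_tilde_of(x: float) -> float:
    """Second component of the path: min{1, max{x - 1/2, 0}}."""
    return min(1.0, max(x - 0.5, 0.0))


def find_x0(g: Callable[[float, int], float], iters: int = 60) -> float:
    """Find x0 in [0, 2] with g(x0, 0) <= 0 and g(x0, 1) <= 0 (up to rounding).

    Assumes g(0, 0) <= 0 and g(2, 1) <= 0, as the supermartingale property gives.
    """
    if g(0.0, 1) <= 0.0:
        return 0.0
    if g(2.0, 0) <= 0.0:
        return 2.0
    lo, hi = 0.0, 2.0  # invariant: phi(lo) > 0 >= phi(hi), phi(x) = g(x,1) - g(x,0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, 1) - g(mid, 0) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_g(x: float, w: int, q: float = 0.8, b: int = 0, eta: float = 0.5) -> float:
    """Hypothetical stand-in for f_T(p(x), p~(x), w) - 1 with two Experts."""
    p, pt = p_of(x), p_tilde_of(x)
    square = math.exp(2.0 * ((p - w) ** 2 - (q - w) ** 2))
    mistakes = math.exp(eta * (abs(pt - w) - (1.0 if b != w else 0.0)) - eta ** 2 / 2.0)
    return 0.5 * square + 0.5 * mistakes - 1.0
```

For the default toy parameters the endpoints $x = 0$ and $x = 2$ both fail, so the bisection branch is used, and the returned point lies on the last segment of the path, where $p > 1/2$ and the boolean component equals 1.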